Monocular Localization of a moving person onboard a Quadrotor MAV

Hyon Lim1  Sudipta N. Sinha2 
1Seoul National University, South Korea
2Microsoft Research Redmond, USA 

ICRA 2015

       
Abstract
 
In this paper, we propose a novel method to recover the 3D trajectory of a moving person from a monocular camera mounted on a quadrotor micro aerial vehicle (MAV). The key contribution is an integrated approach that simultaneously performs visual odometry (VO) and persistent tracking of a person automatically detected in the scene. All computation pertaining to VO, detection and tracking runs onboard the MAV from a front-facing monocular RGB camera. Given the gravity direction from an inertial sensor and the knowledge of the individual's height, a complete 3D trajectory of the person within the reconstructed scene can be estimated. When the ground plane is detected from the triangulated 3D points, the absolute metric scale of the trajectory and the 3D map is also recovered. Our extensive indoor and outdoor experiments show that the system can localize a person moving naturally within a large area. The system runs at 17 frames per second on the onboard computer. A walking person was successfully tracked for two minutes and an accurate trajectory was recovered over a distance of 140 meters with our system running onboard.


paper
[PDF] [Bibtex]
Dataset download
 
About the dataset

This dataset contains image sequences, AHRS data, ground truth bounding box annotations (GTBB) of a person observed from the camera onboard the MAV.
We provide the following sequences used in the evaluation reported in the paper.

mfly-o6       [download]
mfly-o7       [download]
mfly-o8       [download]
mfly-o9       [download]
walk-i4       [download]
walk-i5       [download]
walk-o1       [download]
walk-o2       [download]
walk-o3       [download]

File format

Each directory contains the following files.

gtbb.txt
imu.txt
[frame_number].png

Ground truth bounding box annotations

Each line of the gtbb.txt file stores the 2d bounding box position -- [left, top, width, height] for each frame. Here is an example.

252,203,56,137

This indicates that the top-left corner of the bounding box is at pixel (252,203) and the width and height of the bounding box is 56 and 137 pixels respectively. The line number indicates the frame number.

IMU information

Each line of the imu.txt file contains the following information.

[frame number] , [time of frame captured / millisecond] , [time of IMU captured / millisecond], [roll / degree], [pitch / degree], [yaw / degree]

The frame number is directly associated in one-to-one fashion with the image files.